Approximate Nash Equilibrium
XDO: A Double Oracle Algorithm for Extensive-Form Games
Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium in a number of iterations linear in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in an order of magnitude fewer iterations than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.
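To make the double-oracle loop that PSRO and XDO share concrete, here is a minimal sketch for a zero-sum matrix game, with the restricted game solved by fictitious play. The toy payoff matrix and the solver choice are illustrative assumptions, not the paper's method; XDO itself restricts action sets per infostate and solves the restricted extensive-form game (e.g., with tabular CFR).

```python
# Minimal double-oracle sketch on a zero-sum matrix game (illustrative only).
import numpy as np

def solve_restricted(payoff, iters=5000):
    """Approximate Nash of a zero-sum matrix game via fictitious play."""
    row_counts = np.zeros(payoff.shape[0]); row_counts[0] = 1.0
    col_counts = np.zeros(payoff.shape[1]); col_counts[0] = 1.0
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        row_counts[np.argmax(payoff @ (col_counts / col_counts.sum()))] += 1
        col_counts[np.argmin((row_counts / row_counts.sum()) @ payoff)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

def double_oracle(payoff, tol=1e-3):
    rows, cols = [0], [0]                                # restricted strategy sets
    while True:
        x, y = solve_restricted(payoff[np.ix_(rows, cols)])   # meta-Nash
        x_full = np.zeros(payoff.shape[0]); x_full[rows] = x
        y_full = np.zeros(payoff.shape[1]); y_full[cols] = y
        br_row = int(np.argmax(payoff @ y_full))         # row oracle (best response)
        br_col = int(np.argmin(x_full @ payoff))         # column oracle
        gap = (payoff @ y_full).max() - (x_full @ payoff).min()  # exploitability
        grew = False
        if br_row not in rows: rows.append(br_row); grew = True
        if br_col not in cols: cols.append(br_col); grew = True
        if gap < tol or not grew:
            return x_full, y_full, gap

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(30, 30))   # toy payoff matrix for the row player
x, y, gap = double_oracle(A)
print(len(x.nonzero()[0]), len(y.nonzero()[0]), gap)
```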
Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
McAleer, Stephen, Lanier, John, Fox, Roy, Baldi, Pierre
Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable PSRO-based method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that, unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect-information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of 10^50. P2SRO achieves state-of-the-art performance on Barrage Stratego and beats all existing bots.
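The pipeline structure described above can be summarized in a few lines of bookkeeping. The sketch below is schematic only: train_step(), is_plateaued(), and meta_solver() are hypothetical placeholders standing in for real RL training and meta-Nash computation, not the authors' implementation.

```python
# Schematic of P2SRO's pipeline bookkeeping (hypothetical helper names).
class PipelinePSRO:
    def __init__(self, num_workers, new_policy, meta_solver):
        self.fixed = [new_policy()]          # seed population; frozen policies
        self.active = [new_policy() for _ in range(num_workers)]
        self.new_policy = new_policy
        self.meta_solver = meta_solver       # e.g. Nash over empirical payoffs

    def step(self):
        # Each active policy trains against the meta-Nash mixture over all
        # policies below it in the hierarchy (fixed + lower active ones).
        for level, policy in enumerate(self.active):
            mixture = self.meta_solver(self.fixed + self.active[:level])
            policy.train_step(mixture)       # one RL update vs. the mixture
        # Only the lowest active policy may graduate; this ordering is what
        # preserves PSRO's convergence guarantee under parallelism.
        if self.active[0].is_plateaued():
            self.fixed.append(self.active.pop(0))
            self.active.append(self.new_policy())
```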
Learning in Discounted-cost and Average-cost Mean-field Games
Anahtarcı, Berkay, Karıksız, Can Deha, Saldi, Naci
We consider learning approximate Nash equilibria for discrete-time mean-field games with nonlinear stochastic state dynamics, subject to both average and discounted costs. To this end, we introduce a mean-field equilibrium (MFE) operator whose fixed point is a mean-field equilibrium (i.e., an equilibrium in the infinite-population limit). We first prove that this operator is a contraction, and propose a learning algorithm that computes an approximate mean-field equilibrium by approximating the MFE operator with a random one. Using the contraction property of the MFE operator, we then establish an error analysis of the proposed learning algorithm. Finally, we show that the learned mean-field equilibrium constitutes an approximate Nash equilibrium for finite-agent games.
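The core argument, contraction plus a noisy operator, is easy to illustrate numerically. The sketch below uses a toy affine contraction as a stand-in for the MFE operator (an assumption for illustration, not the paper's dynamics): exact iteration converges geometrically to the fixed point, and iterating a randomly perturbed operator stays within roughly noise/(1 − L) of it.

```python
# Fixed-point iteration of a noisy contraction (toy stand-in for the MFE operator).
import numpy as np

rng = np.random.default_rng(0)
L = 0.8                                      # contraction modulus
A = rng.standard_normal((5, 5))
A *= L / np.linalg.norm(A, 2)                # scale so the map is an L-contraction
b = rng.standard_normal(5)
T = lambda x: A @ x + b                      # exact operator (toy)
x_star = np.linalg.solve(np.eye(5) - A, b)   # its true fixed point

x = np.zeros(5)
noise = 1e-3
for _ in range(200):
    # Random approximation of the operator, as in the learning algorithm.
    x = T(x) + rng.normal(scale=noise, size=5)

# Final error vs. the theoretical O(noise / (1 - L)) scale.
print(np.linalg.norm(x - x_star), noise / (1 - L))
```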
Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games
Zhang, Li, Wang, Wei, Li, Shijian, Pan, Gang
Researchers in artificial intelligence have achieved human-level performance in large-scale perfect-information games, but it remains a challenge to achieve nearly optimal results (i.e., an approximate Nash equilibrium) in large-scale imperfect-information games (e.g., war games, football coaching, or business strategy). Neural Fictitious Self-Play (NFSP) is an effective algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play without prior domain knowledge. However, it relies on Deep Q-Networks, which are offline and hard to converge in online games against changing opponent strategies, so it cannot approach an approximate Nash equilibrium in games with a large search space and deep search depth. In this paper, we propose Monte Carlo Neural Fictitious Self-Play (MC-NFSP), an algorithm that combines Monte Carlo tree search with NFSP and greatly improves performance on large-scale zero-sum imperfect-information games. Experimentally, we demonstrate that MC-NFSP converges to an approximate Nash equilibrium in games with large-scale search depth, while NFSP does not. Furthermore, we develop Asynchronous Neural Fictitious Self-Play (ANFSP), which uses an asynchronous, parallel architecture to collect game experience. In experiments, we show that parallel actor-learners further accelerate and stabilize training.
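For readers unfamiliar with NFSP, the scheme MC-NFSP modifies can be sketched compactly: the agent mixes a best-response policy with an average policy via an anticipatory parameter, and the two policies learn from separate memories. The policy objects below are hypothetical placeholders; MC-NFSP's change is to back the best response with Monte Carlo tree search instead of a DQN.

```python
# Compact sketch of NFSP's action selection and memories (placeholder objects).
import random

class NFSPAgent:
    def __init__(self, br_policy, avg_policy, eta=0.1):
        self.br_policy = br_policy    # RL best response (DQN in NFSP, MCTS in MC-NFSP)
        self.avg_policy = avg_policy  # supervised average policy
        self.eta = eta                # anticipatory parameter
        self.rl_buffer, self.sl_buffer = [], []

    def act(self, state):
        if random.random() < self.eta:
            action = self.br_policy.act(state)
            # Best-response actions supervise the average policy, so it tracks
            # the historical mixture of best responses.
            self.sl_buffer.append((state, action))
        else:
            action = self.avg_policy.act(state)
        return action

    def observe(self, transition):
        self.rl_buffer.append(transition)  # experience for training the best response
```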
Approximate Nash Equilibria with Near Optimal Social Welfare
Czumaj, Artur (University of Warwick) | Fasoulakis, Michail (University of Warwick) | Jurdzinski, Marcin (University of Warwick)
It is known that Nash equilibria and approximate Nash equilibria do not necessarily optimize the social welfare of bimatrix games. In this paper, we show that for every fixed ε > 0, every bimatrix game (with values in [0, 1]) has an ε-approximate Nash equilibrium with total payoff of the players at least a constant factor, (1 − √(1 − ε))², of the optimum. Furthermore, our result can be made algorithmic in the following sense: for every fixed 0 ≤ ε* < ε, if we can find an ε*-approximate Nash equilibrium in polynomial time, then we can find in polynomial time an ε-approximate Nash equilibrium with total payoff of the players at least a constant factor of the optimum. Our analysis is especially tight in the case ε ≥ 1/2, where we show that for any bimatrix game there is an ε-approximate Nash equilibrium with constant-size support whose social welfare is at least 2√ε − ε ≥ 0.914 times the optimal social welfare. Furthermore, we demonstrate that our bound on the social welfare is tight: for every ε ≥ 1/2 there is a bimatrix game for which every ε-approximate Nash equilibrium has social welfare at most 2√ε − ε times the optimal social welfare.
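As a quick sanity check on the stated constants (a check, not part of the paper): 2√ε − ε is increasing on [1/2, 1], equals 2√0.5 − 0.5 ≈ 0.9142 at ε = 1/2, and reaches 1 at ε = 1, while the general bound (1 − √(1 − ε))² is much weaker at ε = 1/2.

```python
# Numeric check of the welfare bounds quoted in the abstract above.
import math

for eps in (0.5, 0.6, 0.75, 1.0):
    # 2*sqrt(eps) - eps is increasing here: its derivative 1/sqrt(eps) - 1 >= 0 on (0, 1].
    print(eps, round(2 * math.sqrt(eps) - eps, 4))    # 0.5 -> 0.9142, ..., 1.0 -> 1.0

eps = 0.5
print(round((1 - math.sqrt(1 - eps)) ** 2, 4))        # general bound: 0.0858
```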